Abstract: The classical method of quantitative structure-activity relationships (QSAR) is 
enriched using non-linear models, as Thom's polynomials allow either uni- or bi-variate 
structural parameters. In this context, catastrophe QSAR algorithms are applied to the 
anti-HIV-1 activity of pyridinone derivatives. This requires calculation of the so-called 
relative statistical power and of its minimum principle in various QSAR models. A new 
index, known as a statistical relative power, is constructed as a Euclidean measure for the 
combined ratio of the Pearson correlation to algebraic correlation, with normalized 
t-Student and the Fisher tests. First and second order inter-model paths are considered for 
mono-variate catastrophes, whereas for bi-variate catastrophes the direct minimum path is 
provided, allowing the QSAR models to be tested for predictive purposes. At this stage, the 
max-to-min hierarchies of the tested models allow the interaction mechanism to be 
identified using structural parameter succession and the typical catastrophes involved. 
Minimized differences between these catastrophe models in the common structurally 
influential domains that span both the trial and tested compounds identify the "optimal 
molecular structural domains" and the molecules with the best output with respect to the 
modeled activity, which in this case is human immunodeficiency virus type 1 (HIV-1) 
inhibition. The best molecules are characterized by hydrophobic interactions with the 
HIV-1 p66 subunit protein, and they concur with those identified in other 3D-QSAR 
analyses. Moreover, the importance of aromatic ring stacking interactions for increasing the 
binding affinity of the inhibitor-reverse transcriptase ligand-substrate complex is highlighted. 

Keywords: Thom's catastrophe polynomials; statistical factors; minimum statistical paths; 
QSAR structural domains; HIV-1 inhibitory activity 



1. Introduction 

Among the mathematical theories that model open-system dynamics, Thom's theory of catastrophes 
has acquired much popularity for its simple yet valuable description of the system-environment 
interaction that includes phenomena such as steady state equilibrium and life cycles [1]. In particular, 
biological systems come first under catastrophe modeling because they display a causal action-reaction 
response to various natural or imposed constraining limits. As an example, the reactions of organisms 
to vital toxicological threats were developed into the survival attractor concept by employing butterfly 
bifurcation phenomenology, which is closely related to the cusp catastrophe, thus revealing the close 
connection with the turning points around singularity points of the fundamental central field laws of 
attraction [2]. The cusp catastrophe was further implemented in the physiological processes of 
predation and generation, thus giving mathematical support to Heidegger's philosophical concept of 
entity and having the major consequence of translating the ontological entities into computer 
language [3]. Following this line of application, Jungian psychology entered the topological 
approach phase through modeling personal unconscious and conscious states using the swallowtail 
catastrophe [4]. As a consequence, neuro-self-organization was advanced by reduction to cusp 
synergetics as an archetypal precursor of epileptic seizures [5]. Nevertheless, in chemistry the 
catastrophe approach enters through the need to unitarily characterize elementary processes such as 
chemical bonding, leading to the so-called bonding evolution theory and reformulation of the electronic 
localization functions [6,7]. In the last decade, catastrophe theory was successfully grounded on 
Hilbert space modeling with the density matrix and non-linear evolution as specific tools for the 
non-commutative (quantum) systems [8]. At this point, the interesting connection with the linear 
superposition of quantum states may be generalized in a non-linear manner with direct correspondence 
for widespread quantitative structure-activity relationship (QSAR) treatments of the "birth and death 
of an organism". 

In this context, the present contribution provides in silico assistance to clinical efforts in current 
antiretroviral therapy by contributing to the development of a given class of actual anti-HIV-1 
compounds and identifying their viral inhibitory mechanisms and influential structural factors. 
Continuous efforts both in theory and in clinical practice are made to provide new and valid data for 
HIV infection management. Although acquired immunodeficiency syndrome (AIDS) was 
first recognized in 1981, only 25 compounds have been approved for use in HIV infected patients, and 
they are distributed among several classes of antiretroviral drug types [9,10]: nucleoside reverse 
transcriptase inhibitors (NRTIs); nucleotide reverse transcriptase inhibitors (NtRTIs); non-nucleoside 
reverse transcriptase inhibitors (NNRTIs); protease inhibitors (PIs); cell entry (or fusion) inhibitors 
(FIs); co-receptor inhibitors (CRIs); and integrase inhibitors (INIs). Among these, it is well known that 
most NNRTIs have a low genetic barrier to resistance, i.e., high viral resistance may be induced by a 
single mutation at the NNRTI binding site [11]. It is this particular feature that makes NNRTIs so well 
adapted for a comprehensive catastrophe theory application. Although NNRTIs are an open battlefield 
for research, being highly active in naive and drug-resistant HIV infected patients [12], QSAR 
methods are cost-effective approaches to developing new and potent molecules with increased 
anti-HIV activity [13-23]. As a viable alternative to the available 3D-QSARs, the present endeavor 
makes the first steps toward generalizing multi-linear QSAR to non-linear catastrophe QSAR analysis 
and toward providing a conceptual-computational framework in which both the interactions occurring 
between the pyridinone derivatives and the NNRTI binding site and the structural domains influential 
for HIV-1 RT inhibitory activity are accounted for [24]. 

2. Background Theories 

2.1. QSAR Phenomenology 

The fundamental problem of structure-activity analysis may be described as follows: given a 
congener set of N compounds/molecules with measured/observed activity (A), one searches for the best 
correlation of it with the structural (intrinsic, internal) molecular information quantified by 
M properties (such as hydrophobicity, polarization, total energy), classically presented in multi-linear 
form [25-31]: 

$$Y = b_0 + b_1 X_1 + \ldots + b_k X_k + \ldots + b_M X_M \tag{1}$$

Equation (1) has some basic features, namely: 

• Y stands for the computed activity, not the observed activity, from the statistical characteristics of 
the present approach; thus the validation of Equation (1) should be done for another (preferably 
external or testing) set of compounds with which the predictive power of Equation (1) is tested. 

• Because the right side of Equation (1) unfolds as a linear summation of the structural 
characteristics, it corresponds in fact with the quantum superposition principle, which provides a 
global eigen-solution for a quantum system from its particular realization in orthogonal or 
projective sub-space; hence the need arises for the structural indices $X_1, \ldots, X_M$ to be either 
linearly independent or orthogonal in the algebraic space built from their associated vectors presented 
in Table 1. 

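Table 1. The QSAR working table for Equation (1) in the presence of M structural 
descriptors for N compounds with known activities. 

| Observed Activity | Structural Predictor Variables | | | | |
|---|---|---|---|---|---|
| $A$ | $X_1$ | ... | $X_k$ | ... | $X_M$ |
| $A_1$ | $x_{11}$ | ... | $x_{1k}$ | ... | $x_{1M}$ |
| $A_2$ | $x_{21}$ | ... | $x_{2k}$ | ... | $x_{2M}$ |
| ... | ... | ... | ... | ... | ... |
| $A_N$ | $x_{N1}$ | ... | $x_{Nk}$ | ... | $x_{NM}$ |
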
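
As an illustration of how the coefficients of Equation (1) may be obtained from a working table of the kind shown in Table 1, the following minimal sketch fits the multi-linear model by ordinary least squares. The function name `fit_multilinear` and the synthetic numbers are assumptions introduced for this example only; they are not data or code from the present study.

```python
import numpy as np

def fit_multilinear(X, A):
    """Least-squares fit of A ~ b0 + b1*X1 + ... + bM*XM (Equation (1)).

    X: (N, M) array of structural descriptors; A: (N,) observed activities.
    Returns the coefficients [b0, b1, ..., bM] and the computed activities Y.
    """
    X = np.asarray(X, dtype=float)
    A = np.asarray(A, dtype=float)
    if X.ndim == 1:                       # a single descriptor given as a vector
        X = X[:, None]
    design = np.column_stack([np.ones(len(A)), X])   # unity column carries b0
    coeffs, *_ = np.linalg.lstsq(design, A, rcond=None)
    return coeffs, design @ coeffs

# Synthetic (hypothetical) working table: N = 5 compounds, M = 2 descriptors
X = np.array([[0.1, 30.0], [0.5, 31.0], [1.0, 33.0], [1.5, 34.0], [2.0, 36.0]])
A = 2.0 + 3.0 * X[:, 0] - 0.1 * X[:, 1]   # activities built exactly from Eq. (1)
b, Y = fit_multilinear(X, A)
print(np.round(b, 6))
```

Because the synthetic activities are generated exactly from Equation (1), the fit returns the generating coefficients; with real data the computed Y differs from the observed A, which is why a separate test set is needed for validation.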
However, in order for the chemical structure to be correlated with bio-, eco-, or pharmacological 
activity in an analytical manner (whence the name Quantitative Structure-Activity Relationship 
arises) that makes sense for the ligand-receptor interaction under study, the Organization for Economic 
Cooperation and Development (OECD) developed the so-called QSAR-OECD principles, which have 
already been adopted by the EU Parliament as the official guidelines for further regulation of 
compounds in the European Union. They are, in short [32]: 

• QSAR-1: a defined endpoint 

• QSAR-2: an unambiguous algorithm 

• QSAR-3: a defined domain of applicability 

• QSAR-4: appropriate measures of goodness-of-fit, robustness and predictivity 

• QSAR-5: a mechanistic interpretation, if possible 

Put differently, they express the essence of the chemical modeling of biological effects while 
relaying (Husserl-Russell) knowledge phenomenology in a more general manner [33]: 

• QSAR-1: why does one do modeling? 

• QSAR-2: how does one do modeling? 

• QSAR-3: with what tools does one model? 

• QSAR-4: how reliable is what was modeled? 

• QSAR-5: what knowledge did the model provide? 

Therefore, although the backbone of QSAR modeling is based on Equation (1), one should be aware 
that it represents, despite the innumerable extant studies, only one type of model: the 
multi-linear type. It is therefore worth refreshing QSAR studies by exploring other ways of combining 
the structural parameters that cause the observed biological activity. However, although it is clear that 
non-linear QSAR is the next generation of correlations, one should not search arbitrarily or randomly 
while having at hand a well-designed theory of non-linear modeling of natural phenomena: Thom's 
catastrophe theory, the basic assumptions and main working tools of which are presented next. 

2.2. Thom's Catastrophe Theory 

René Thom's catastrophe theory basically describes how, for a given system, continuous action on 
the control space ($C^k$), parameterized by the $c_k$'s, provides a sudden change in its behavior space ($I^m$), 
described by the $x_m$ variables, through stable singularities of the smooth map [34,35] 

$$\eta(c_k, x_m): C^k \times I^m \to \Re \tag{2}$$

with $\eta(c_k, x_m)$ called the generic potential of the system. Therefore, catastrophes are given by the set of 
critical points $(c_k, x_m)$ for which the field gradient of the generic potential vanishes 

$$M^{k \times m} = \left\{ (c_k, x_m) \in C^k \times I^m \;\middle|\; \nabla_x \eta(c_k, x_m) = 0 \right\} \tag{3}$$

or, more rigorously: a catastrophe is a singularity of the map $M^{k \times m} \to C^k$. 

Next, depending on the number of parameters in space $C^k$ (also called the co-dimension, k) and on 
the number of variables in space $I^m$ (also called the co-rank, m), Thom classified the generic potentials 
(or maps) given by Equation (2) as seven unfolding elementary (in the sense of universal) 
catastrophes, i.e., providing the multi-variable (with the co-rank up to two) and multi-parametrical 
(with the co-dimension up to four) polynomials listed in Table 2. Going to the higher derivatives of the 
generic potential (the fields), the control parameter $c_k^*$ for which the Laplacian of the generic 
potential vanishes 

$$\Delta_x \eta(c_k^*, x_m) = 0 \tag{4}$$

gives the bifurcation point. Consequently, the set of control parameters $c^*$ for which the Laplacian of a 
critical point is non-zero defines the domain of stability of the critical point. It is clear now that small 
perturbations of $\eta(c^*, x)$ bring the system from one domain of stability to another; otherwise, the 
system is located within a domain of structural stability. 
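
The conditions of vanishing gradient and vanishing Laplacian can be made concrete on a one-variable case. The sketch below takes the cusp unfolding $x^4 + ux^2 + vx$ as the generic potential, so that the gradient reduces to $4x^3 + 2ux + v$ and the Laplacian to the second derivative $12x^2 + 2u$; eliminating x between the two gives the bifurcation set $8u^3 + 27v^2 = 0$. The function names and the sample control parameters are assumptions chosen for illustration.

```python
import numpy as np

def cusp_critical_points(u, v, tol=1e-9):
    """Real critical points of eta(x) = x**4 + u*x**2 + v*x (zero gradient)."""
    roots = np.roots([4.0, 0.0, 2.0 * u, v])       # 4x^3 + 2ux + v = 0
    return np.sort(roots[np.abs(roots.imag) < tol].real)

def cusp_curvature(x, u):
    """Second derivative of eta; in one variable this is the Laplacian."""
    return 12.0 * x**2 + 2.0 * u

def cusp_discriminant(u, v):
    """Vanishes on the bifurcation set, where gradient and Laplacian are both zero."""
    return 8.0 * u**3 + 27.0 * v**2

for u, v in [(-3.0, 0.0), (1.0, 0.0)]:
    pts = cusp_critical_points(u, v)
    kinds = ["minimum" if cusp_curvature(x, u) > 0 else "maximum" for x in pts]
    print(u, v, cusp_discriminant(u, v), list(zip(np.round(pts, 4), kinds)))
```

A negative discriminant corresponds to three critical points (two minima separated by a maximum), a positive one to a single minimum; crossing the bifurcation set is what moves the system from one domain of stability to another.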



Table 2. Thom's Classification of Elementary Catastrophes [36,37]. 

| Name | Co-dimension | Co-rank | Universal unfolding |
|---|---|---|---|
| Fold | 1 | 1 | $x^3 + ux$ |
| Cusp | 2 | 1 | $x^4 + ux^2 + vx$ |
| Swallow tail | 3 | 1 | $x^5 + ux^3 + vx^2 + wx$ |
| Butterfly | 4 | 1 | $x^6 + ux^4 + vx^3 + wx^2 + tx$ |

[Parametric representation column: graphics not reproduced.]




Remarkably, the cases described above correspond to the equilibrium limit of the dynamical 
(non-equilibrium) evolution of an open system 



I St j 



= 0 



(5) 



where the behavior space is further parameterized by the temporal paths $x_m(c_k, t)$. The connection with 
equilibrium is recovered through the stationary time regime imposed on the critical points. In this way, 
the set of points giving a critical point in the stationary $t \to +\infty$ regime (the so-called ω-limit) 
corresponds to an attractor, and it forms a basin, whereas the stationary regime $t \to -\infty$ (the so-called 
α-limit) describes a repellor. In this way, the catastrophe polynomials may be regarded either as an 
asymptotic solution of a dynamical evolutionary system or as a steady state solution allowing the 
quasi-equilibrium of the ligand-receptor or inhibitor-organism interactions to be described. However, 
in complex binding systems with multiple evolutionary phases, e.g., the HIV-1 life cycle, the 
possibility of "linking" the various classes of catastrophes themselves may provide a striking analytical 
approach to the dynamics and mutational sensitivity of the studied interaction that starts with the 
actual catastrophe-QSAR method. 
3. Catastrophe-QSAR Method 

To construct a QSAR rationale from the elementary catastrophes, the following steps 
are implemented: 

i. Assuming a vectorial form for the activities and for the associated QSARs, the elementary 
catastrophes of Table 2 are recast as the catastrophe-QSAR models of Table 3. 

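Table 3. Algebraic realization of Thom's elementary catastrophes as uni- and bi-variate non-linear 
QSARs. The systematics of the sub-indices indicate consecutive coupled pairs, where each 
pair is interpreted as: the index of a structural factor followed by its power. 

| Model | QSAR Equation |
|---|---|
| GROUP I: with one descriptor only, $\vert X_1\rangle$ | |
| QSAR-(I) | $\vert Y_{I}\rangle = a_0\vert 1\rangle + a_{11}\vert X_1\rangle$ |
| Fold | $\vert Y_{F}\rangle = f_0\vert 1\rangle + f_{11}\vert X_1\rangle + f_{13}\vert X_1^3\rangle$ |
| Cusp | $\vert Y_{C}\rangle = c_0\vert 1\rangle + c_{11}\vert X_1\rangle + c_{12}\vert X_1^2\rangle + c_{14}\vert X_1^4\rangle$ |
| Swallow tail | $\vert Y_{ST}\rangle = s_0\vert 1\rangle + s_{11}\vert X_1\rangle + s_{12}\vert X_1^2\rangle + s_{13}\vert X_1^3\rangle + s_{15}\vert X_1^5\rangle$ |
| Butterfly | $\vert Y_{B}\rangle = b_0\vert 1\rangle + b_{11}\vert X_1\rangle + b_{12}\vert X_1^2\rangle + b_{13}\vert X_1^3\rangle + b_{14}\vert X_1^4\rangle + b_{16}\vert X_1^6\rangle$ |
| GROUP II: with two descriptors, $\vert X_1\rangle$, $\vert X_2\rangle$ | |
| QSAR-(II) | $\vert Y_{II}\rangle = q_0\vert 1\rangle + q_{11}\vert X_1\rangle + q_{21}\vert X_2\rangle$ |
| Hyperbolic umbilic | $\vert Y_{HU}\rangle = h_0\vert 1\rangle + h_{11}\vert X_1\rangle + h_{21}\vert X_2\rangle + h_{1121}\vert X_1 X_2\rangle + h_{13}\vert X_1^3\rangle + h_{23}\vert X_2^3\rangle$ |
| Elliptic umbilic | $\vert Y_{EU}\rangle = e_0\vert 1\rangle + e_{11}\vert X_1\rangle + e_{21}\vert X_2\rangle + e_{12}\vert X_1^2\rangle + e_{22}\vert X_2^2\rangle + e_{1122}\vert X_1 X_2^2\rangle + e_{13}\vert X_1^3\rangle$ |
| Parabolic umbilic | $\vert Y_{PU}\rangle = p_0\vert 1\rangle + p_{11}\vert X_1\rangle + p_{21}\vert X_2\rangle + p_{12}\vert X_1^2\rangle + p_{22}\vert X_2^2\rangle + p_{1221}\vert X_1^2 X_2\rangle + p_{24}\vert X_2^4\rangle$ |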
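
The models of Table 3 differ only in which powers and products of the descriptors enter the regression, so they can all be fitted with one routine once the terms are listed. The following is a minimal sketch under that reading: the dictionary `CATASTROPHE_TERMS`, the helper names and the least-squares fit via NumPy are assumptions of this example, and the numerical check uses synthetic data rather than the molecules studied here.

```python
import numpy as np

# Each term is a pair (power of X1, power of X2); the unity vector |1> is added separately.
CATASTROPHE_TERMS = {
    "QSAR-I":             [(1, 0)],
    "Fold":               [(1, 0), (3, 0)],
    "Cusp":               [(1, 0), (2, 0), (4, 0)],
    "Swallow tail":       [(1, 0), (2, 0), (3, 0), (5, 0)],
    "Butterfly":          [(1, 0), (2, 0), (3, 0), (4, 0), (6, 0)],
    "QSAR-II":            [(1, 0), (0, 1)],
    "Hyperbolic umbilic": [(1, 0), (0, 1), (1, 1), (3, 0), (0, 3)],
    "Elliptic umbilic":   [(1, 0), (0, 1), (2, 0), (0, 2), (1, 2), (3, 0)],
    "Parabolic umbilic":  [(1, 0), (0, 1), (2, 0), (0, 2), (2, 1), (0, 4)],
}

def design_matrix(model, x1, x2=None):
    """Columns |1>, then one column X1**p * X2**q per term of the chosen model."""
    terms = CATASTROPHE_TERMS[model]
    x1 = np.asarray(x1, dtype=float)
    if x2 is None:
        if any(q > 0 for _, q in terms):
            raise ValueError(f"{model} needs a second descriptor")
        x2 = np.ones_like(x1)
    x2 = np.asarray(x2, dtype=float)
    cols = [np.ones_like(x1)] + [x1**p * x2**q for p, q in terms]
    return np.column_stack(cols)

def fit_model(model, activity, x1, x2=None):
    """Least-squares coefficients and computed activities for one model."""
    D = design_matrix(model, x1, x2)
    coeffs, *_ = np.linalg.lstsq(D, np.asarray(activity, dtype=float), rcond=None)
    return coeffs, D @ coeffs

# Synthetic check: activities generated from a fold-type law are recovered
x = np.linspace(-1.0, 2.0, 10)
coeffs, y = fit_model("Fold", 1.0 + 2.0 * x - 0.5 * x**3, x)
print(np.round(coeffs, 6))
```

Swapping the roles of the two descriptors in a Group II call gives the second variant of an umbilic model, since the elliptic and parabolic forms are not symmetric in $X_1$ and $X_2$.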



ii. Determine the norms for each model 

$$\Vert\, |Y\rangle \Vert = \sqrt{\langle Y | Y \rangle} = \sqrt{\sum_{i=1}^{N} y_i^2} \tag{6}$$



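iii. Calculate the algebraic correlation factor for each model [31] 

$$R_{ALG} = \frac{\Vert\, |Y\rangle \Vert}{\Vert\, |A\rangle \Vert} = \sqrt{\frac{\sum_{i=1}^{N} y_i^2}{\sum_{i=1}^{N} A_i^2}} \tag{7}$$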
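
Steps ii and iii amount to two short vector operations. The sketch below assumes that the algebraic correlation factor is the ratio of the norm of the computed activity vector to that of the observed one, as read from Equation (7); the function names and the two-component vectors are hypothetical and serve only to make the arithmetic visible.

```python
import numpy as np

def vector_norm(v):
    """Norm of an activity vector: square root of the sum of squared components."""
    v = np.asarray(v, dtype=float)
    return float(np.sqrt(np.sum(v**2)))

def algebraic_correlation(y_computed, a_observed):
    """Ratio of the computed-activity norm to the observed-activity norm."""
    return vector_norm(y_computed) / vector_norm(a_observed)

# Hypothetical vectors: norms are 5 and 10, so the ratio is one half
print(algebraic_correlation([3.0, 4.0], [6.0, 8.0]))
```

Since this factor compares only vector lengths, it stays close to one whenever the computed activities have the right overall magnitude, which is why it is combined with the Pearson factor in the next step rather than used alone.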



iv. Calculate the so-called "statistical relative power" index for each model with each set 
of descriptors 

$$\Pi = \sqrt{r^2 + t^2 + f^2} \tag{8}$$

where the components are defined as follows: 

• relative index of correlation: 

$$r = \frac{R_{ALG}}{R_{Pearson}} \tag{9}$$

• relative index for Student's t-test: 

$$t = \frac{t_{Computed}}{t_{Tabulated}(1-\alpha = 0.99;\; N-M-2)} \tag{10}$$

• relative index for Fisher's test: 

$$f = \frac{F_{Computed}}{F_{Tabulated}(1-\alpha = 0.99;\; M,\, N-M-1)} \tag{11}$$
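
Equations (8)-(11) can be checked directly against a reported row. The sketch below recomputes the three relative indices and the statistical relative power for the QSAR-(I) LogP model, using the values listed for it in Table 5 together with the tabulated thresholds quoted in that table's footnotes; the function name is an assumption of this example.

```python
import math

def relative_statistical_power(r_alg, r_pearson, t_computed, t_tabulated,
                               f_computed, f_tabulated):
    """Return (r, t, f, Pi) following Equations (9), (10), (11) and (8)."""
    r = r_alg / r_pearson            # relative index of correlation
    t = t_computed / t_tabulated     # relative index for Student's t-test
    f = f_computed / f_tabulated     # relative index for Fisher's test
    return r, t, f, math.sqrt(r * r + t * t + f * f)

# QSAR-(I) LogP row of Table 5; t_Tabulated = 2.845 and F_Tabulated = 8.02
r, t, f, power = relative_statistical_power(0.984, 0.228, 22.344, 2.845, 1.150, 8.02)
print(round(r, 3), round(t, 3), round(f, 3), round(power, 3))
```

The recomputed indices agree with the tabulated 4.317, 7.854, 0.143 and 8.963 to within the rounding of the inputs.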



v. Determine the generalized Euclidean distances between corresponding type-I and type-II models 
employing different descriptors 

$$\Delta\Pi = \sqrt{(r - r')^2 + (t - t')^2 + (f - f')^2} \tag{12}$$

and establish formal matrices for the models' differences for single descriptors, respectively 

$$\Delta^2\Pi_{I(X_1; X_2)} = \Delta\Pi_{I(X_1)} - \Delta\Pi_{I(X_2)} \tag{13}$$

where 

$$\Delta\Pi_{I(X = X_1 \vee X_2)} = \begin{pmatrix}
QSAR_{I(X)} - F_{(X)} & QSAR_{I(X)} - C_{(X)} & QSAR_{I(X)} - ST_{(X)} & QSAR_{I(X)} - B_{(X)} \\
 & F_{(X)} - C_{(X)} & F_{(X)} - ST_{(X)} & F_{(X)} - B_{(X)} \\
 & & C_{(X)} - ST_{(X)} & C_{(X)} - B_{(X)} \\
 & & & ST_{(X)} - B_{(X)}
\end{pmatrix} \tag{14}$$

and for pair descriptors 

$$\Delta\Pi_{II(X_1 \wedge X_2)} = \begin{pmatrix}
QSAR_{II(X_1,X_2)} - HU_{(X_1,X_2)} & QSAR_{II(X_1,X_2)} - EU_{(X_1,X_2)} & QSAR_{II(X_1,X_2)} - PU_{(X_1,X_2)} \\
 & HU_{(X_1,X_2)} - EU_{(X_1,X_2)} & HU_{(X_1,X_2)} - PU_{(X_1,X_2)} \\
 & & EU_{(X_1,X_2)} - PU_{(X_1,X_2)}
\end{pmatrix} \tag{15}$$
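
The inter-model distances of Equation (12) and the upper-triangular layout of Equation (14) can be illustrated with the LogP models. The sketch below uses the (r, t, f) triples listed for the LogP rows of Table 5; the dictionary layout and function names are assumptions of this example.

```python
import math

# (r, t, f) triples of the LogP models, transcribed from Table 5
logp_models = {
    "QSAR": (4.317, 7.854, 0.143),
    "F":    (2.581, 8.062, 0.213),
    "C":    (2.832, 5.666, 0.109),
    "ST":   (1.720, 6.561, 0.277),
    "B":    (1.711, 5.332, 0.212),
}

def delta_pi(m1, m2):
    """Euclidean distance between two models in (r, t, f) space, Equation (12)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(m1, m2)))

def distance_matrix(models):
    """Upper-triangular set of distances, keyed by ordered model pairs."""
    names = list(models)
    return {(p, q): delta_pi(models[p], models[q])
            for i, p in enumerate(names) for q in names[i + 1:]}

distances = distance_matrix(logp_models)
for (p, q), d in distances.items():
    print(f"{p}-{q}: {d:.3f}")
```

The values obtained this way match the LogP entries reported in Table 7 (for instance 1.750 for QSAR-F and 2.411 for F-C) to within rounding of the tabulated indices.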



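vi. Identify all minimum paths across all differences $\Delta\Pi_{I(X_1 \vee X_2)}$, $\Delta^2\Pi_{I(X_1; X_2)}$ and $\Delta\Pi_{II(X_1 \wedge X_2)}$ for a 
given set of descriptors $(X_1, X_2)$ 

$$\delta\left\{\Delta\Pi_{I(X_1 \vee X_2)}\right\} = 0, \qquad \delta\left\{\Delta^2\Pi_{I(X_1; X_2)}\right\} = 0, \qquad \delta\left\{\Delta\Pi_{II(X_1 \wedge X_2)}\right\} = 0 \tag{16}$$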
The combination of descriptors that fulfills this system provides the molecular mechanism of the 
interaction. The correlation models involved are ordered according to their relative statistical power 
within the same molecular mechanism, thereby providing the best models. Because pair-descriptors are 
primarily involved in the present analysis, one can consider the first two such "waves" and their best 
correlation models up to the second order minimum paths, as in Equation (16). 

vii. For selected correlation models, in either structure-driven or molecular mechanistic "waves," one 
employs them to compute the associated predicted activities for test molecules and to provide the 
statistics regarding the observed activity. If the obtained relative statistical power is close to those 
characteristic for the trial set of molecules, then these models may be validated for the specific 
eco-, bio-, or pharmacological problem. Moreover, further insight will be provided by the analysis 
of the catastrophe shape of the models involved and discussed accordingly. 
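
The first part of step vii, computing predicted activities for the test molecules, can be sketched with the simplest model at hand. The example below applies the QSAR-(I) LogP equation reported in Table 5 to the test-set rows of Table 4 whose LogP values are legible (NG3 to NG9) and compares the result with the observed activities through the Pearson correlation. It is an illustration only: the choice of this model, the restriction to these rows and the variable names are assumptions of the example, and the full validation would also recompute the t, F and relative statistical power indices.

```python
import numpy as np

# Test-set rows NG3..NG9 of Table 4: LogP and observed activity Log(1/IC50)
logp = np.array([-0.08, 2.72, 1.06, 0.96, 2.02, 1.67, -0.53])
observed = np.array([6.36, 7.85, 6.59, 6.41, 6.43, 6.34, 5.63])

def predict_qsar_i_logp(x):
    """QSAR-(I) LogP equation of Table 5: Y = 5.861 + 0.240 * LogP."""
    return 5.861 + 0.240 * np.asarray(x, dtype=float)

predicted = predict_qsar_i_logp(logp)
residuals = observed - predicted
pearson = float(np.corrcoef(predicted, observed)[0, 1])

for number, y, a in zip(range(3, 10), predicted, observed):
    print(f"NG{number}: predicted {y:.3f}, observed {a:.2f}")
print(f"Pearson correlation on these test rows: {pearson:.3f}")
```

The same pattern applies to any model of Table 3: evaluate the fitted polynomial on the test descriptors, then compare the statistics with those of the trial set.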

Further insights into catastrophe theory and its natural consequences for the behavior of the 
statistical (Pearson) correlation may be found in the Appendix. 

4. Application to Non-Nucleoside Reverse Transcriptase Pyridinone Inhibitors 



4.1. Input Data 

As a working molecular series, the interesting series of pyridinone derivatives in Table 4 is herein 
employed [24] because of their potential for improving and complementing the currently available four 
NNRTIs that have been approved by the U.S. FDA for HIV/AIDS treatment (Nevirapine-Viramune®, 
Delavirdine-Rescriptor®, Efavirenz-Sustiva®, Etravirine-Intelence®), all of which bind to the 
hydrophobic pocket of HIV-1 reverse transcriptase [38]. The pyridinone derivatives were divided into 
a training set of 23 compounds and a test set of 9 compounds according to the methods of 
normal/Gaussian (G) and non-normal/non-Gaussian (NG) fitted activity [39-41] (Figure 1). 

Figure 1. Gaussian (G) and non-Gaussian (NG) screening of the observed activities of the 
working molecules in Table 4 grouped into trial and test congener series. 

[Figure 1 plot: Trial and Test Molecules; legend: Trial Set, Test Set.]
Table 4. Actual working reverse transcriptase pyridinone inhibitors grouped in Gaussian (G) and non-Gaussian (NG) molecular congeneric 
sets with their structural information (hydrophobicity, LogP; molecular polarizability, POL [Å³]; and total optimized energy of formation, 
H [kcal/mol]) computed with the semi-empirical PM3 method [42], along with their observed activity A = Log(1/IC50) [24]. 

[Structure drawings are not reproduced.]

| No. | Type | Name | A obs = Log(1/IC50) | LogP | POL (Å³) | H (kcal/mol) |
|---|---|---|---|---|---|---|
| 1 | G1 | 3-{[(6'-azabenzofuran-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 3.98 | -0.54 | 31.21 | -14.67 |
| 2 | G2 | 3-{[(5'-azabenzofuran-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 4.49 | -0.54 | 31.21 | -16.195 |
| 3 | G3 | 3-{[(pyridine-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 4.82 | 0.21 | 27.87 | -5.854 |
| 4 | G4 | 3-benzylamino-5-ethyl-6-methylpyridin-2(1H)-one | 5.27 | 0.67 | 28.58 | -11.659 |
| 5 | G5 | 3-{[(1',3'-naphthoxazol-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.57 | 1.20 | 38.48 | -1.878 |
| 6 | G6 | 3-{[(1'-benzopyran-4'-one-3'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.96 | -0.71 | 33.84 | -61.455 |
| 7 | G7 | 3-{[(benzopyridine-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.28 | 1.16 | 35.14 | 11.246 |
| 8 | G8 | 3-{[(1',3'-benzothiazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.46 | 0.54 | 33.57 | 17.808 |
| 9 | G9 | 3-{[(4'-methyl-benzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.92 | 0.67 | 33.05 | -27.613 |
| 10 | G10 | 3-{[(4',7'-dichloro-benzofuran-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 7.24 | 0.88 | 35.78 | -33.749 |
| 11 | G11 | 3-{[(4',7'-dimethyl-benzoxazol-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 7.7 | 1.13 | 34.88 | -38.048 |
| 12 | G12 | 3-{[(4',7'-dichloro-benzoxazol-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 7.72 | 1.24 | 35.07 | -30.071 |
| 13 | G13 | 3-[(4',7'-dimethyl-benzoxazol-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 7.55 | 2.62 | 35.37 | -47.701 |
| 14 | G14 | 3-[(4',5',6',7'-tetrahydro-benzoxazole-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 7.24 | -0.02 | 32.08 | -63.299 |
| 15 | G15 | 3-{[(4'-methoxy-benzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.74 | -0.05 | 33.68 | -54.452 |
| 16 | G16 | 3-{[(4',5',6',7'-tetrahydro-benzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.55 | -1.50 | 31.59 | -50.643 |
| 17 | G17 | 3-{[(benzothiophene-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.30 | 0.19 | 34.28 | 11.703 |
| 18 | G18 | 3-{[(5'-methylbenzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.90 | 0.67 | 33.05 | -27.741 |
| 19 | G19 | 3-[(benzopyridine-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 5.61 | 2.71 | 35.62 | 3.331 |
| 20 | G20 | 3-{[(indol-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.36 | -0.34 | 32.63 | 4.727 |
| 21 | G21 | 3-{[(quinazolin-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.12 | 0.02 | 31.92 | 8.171 |
| 22 | G22 | 3-{[(indol-3'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 4.65 | -0.43 | 32.63 | 2.957 |
| 23 | G23 | 3-(β-phenylethyl)-5-ethyl-6-methylpyridin-2(1H)-one | 4.30 | 2.36 | 29.06 | -23.245 |
| 24 | NG1 | 3-{[(4'-quinazolone-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.60 | [illegible] | [illegible] | [illegible] |
| 25 | NG2 | 3-{[(3',4'-diazobenzofuran-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.72 | [illegible] | [illegible] | [illegible] |
| 26 | NG3 | 3-{[(7'-hydroxy-benzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.36 | -0.08 | 31.85 | -62.189 |
| 27 | NG4 | 3-[(4',7'-dichloro-benzoxazole-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 7.85 | 2.72 | 35.55 | -39.459 |
| 28 | NG5 | 3-{[(7'-ethyl-benzoxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.59 | 1.06 | 34.88 | -34.478 |
| 29 | NG6 | 3-[(5'-phenyl-oxazole-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 6.41 | 0.96 | 35.17 | -21.361 |
| 30 | NG7 | 3-[(benzothiazole-2'-yl)ethyl]-5-ethyl-6-methylpyridin-2(1H)-one | 6.43 | 2.02 | 34.06 | 8.873 |
| 31 | NG8 | 3-{[(2'-naphthyl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 6.34 | 1.67 | 35.85 | 5.495 |
| 32 | NG9 | 3-{[(5'-phenyl-oxazole-2'-yl)methyl]amino}-5-ethyl-6-methylpyridin-2(1H)-one | 5.63 | -0.53 | 34.69 | -10.850 |
4.2. Results and Discussion 

The catastrophe-QSAR algorithm of Section 3 was applied to the molecules of Table 4, and the trial results are presented in Tables 5-9. 



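Table 5. Correlation equations for the Group-I models of Table 3 and the molecular structures and data of Table 4. Entries marked [?] are not legible in the source. 

| Catastrophe | QSAR Model | $R_{Pearson}$ (a) | $R_{ALG}$ (b) | r (c) | t-Stud. | t (d) | Fisher | f (e) | Π (f) |
|---|---|---|---|---|---|---|---|---|---|
| QSAR (I) | $\vert Y_{I}^{LogP}\rangle = 5.861\vert 1\rangle + 0.240\vert LogP\rangle$ | 0.228 | 0.984 | 4.317 | 22.344 | 7.854 | 1.150 | 0.143 | 8.963 |
| | $\vert Y_{I}^{POL}\rangle = -2.257\vert 1\rangle + 0.249\vert POL\rangle$ | 0.554 | 0.989 | 1.784 | -0.832 | -0.292 | 9.284 | 1.158 | 2.147 |
| | $\vert Y_{I}^{H}\rangle = 5.57\vert 1\rangle - 0.021\vert H\rangle$ | 0.476 | 0.987 | 2.074 | 20.597 | 7.24 | 6.156 | 0.768 | 7.57 |
| Fold (F) | $\vert Y_{F}^{LogP}\rangle = 5.854\vert 1\rangle + 0.738\vert LogP\rangle - 0.106\vert LogP^3\rangle$ | 0.382 | 0.986 | 2.581 | 22.936 | 8.062 | 1.705 | 0.213 | 8.468 |
| | $\vert Y_{F}^{POL}\rangle = -24.206\vert 1\rangle + 1.26\vert POL\rangle - 3\cdot 10^{-4}\vert POL^3\rangle$ | 0.601 | 0.989 | 1.646 | -1.422 | -0.45 | 5.650 | 0.704 | 1.859 |
| | $\vert Y_{F}^{H}\rangle = 5.58\vert 1\rangle - 0.016\vert H\rangle - 2\cdot 10^{-6}\vert H^3\rangle$ | 0.481 | 0.987 | 2.053 | 20.095 | 7.063 | 3.01 | 0.375 | 7.365 |
| Cusp (C) | $\vert Y_{C}^{LogP}\rangle = 5.707\vert 1\rangle + 0.426\vert LogP\rangle + 0.372\vert LogP^2\rangle - 0.01\vert LogP^4\rangle$ | 0.348 | 0.985 | 2.832 | 16.120 | 5.666 | 0.872 | 0.109 | 6.335 |
| | $\vert Y_{C}^{POL}\rangle = 431.26\vert 1\rangle - 35.694\vert POL\rangle + 0.83[?]\vert POL^2\rangle - 10^{-4}\vert POL^4\rangle$ | 0.713 | 0.992 | 1.391 | 2.240 | 0.787 | 6.558 | 0.818 | 1.796 |
| | $\vert Y_{C}^{H}\rangle = 5.003\vert 1\rangle + 0.042\vert H\rangle + 0.003\vert H^2\rangle - 10^{-6}\vert H^4\rangle$ | 0.764 | 0.993 | 1.300 | 19.802 | 6.960 | 8.864 | 1.105 | 7.166 |
| Swallow tail (ST) | $\vert Y_{ST}^{LogP}\rangle = 5.649\vert 1\rangle + 1.608\vert LogP\rangle + 0.326\vert LogP^2\rangle - 0.978\vert LogP^3\rangle + 0.093\vert LogP^5\rangle$ | 0.575 | 0.989 | 1.720 | 18.665 | 6.561 | 2.222 | 0.277 | 6.788 |
| | $\vert Y_{ST}^{POL}\rangle = 1476.244\vert 1\rangle - 156.079\vert POL\rangle + 5.79\vert POL^2\rangle - 0.079\vert POL^3\rangle + 5.5\cdot 10^{-5}\vert POL^5\rangle$ | 0.715 | 0.992 | 1.387 | 0.45 | 0.158 | 4.708 | 0.587 | 1.515 |
| | $\vert Y_{ST}^{H}\rangle = 4.884\vert 1\rangle + 0.031\vert H\rangle + 0.004\vert H^2\rangle + 5.2\cdot 10^{-5}\vert H^3\rangle + 4\cdot 10^{-10}\vert H^5\rangle$ | 0.763 | 0.993 | 1.302 | 15.608 | 5.486 | 6.263 | 0.781 | 5.692 |
| Butterfly (B) | $\vert Y_{B}^{LogP}\rangle = 5.646\vert 1\rangle + 1.464\vert LogP\rangle + 0.303\vert LogP^2\rangle - 0.688\vert LogP^3\rangle - 0.041\vert LogP^4\rangle + 0.027\vert LogP^6\rangle$ | 0.578 | 0.989 | 1.711 | 15.169 | 5.332 | 1.704 | 0.212 | 5.604 |
| | $\vert Y_{B}^{POL}\rangle = -16485.827\vert 1\rangle + 2491.049\vert POL\rangle - 146.094\vert POL^2\rangle + 4.037\vert POL^3\rangle - 0.047\vert POL^4\rangle + 2.9\cdot 10^{-5}\vert POL^6\rangle$ | 0.718 | 0.992 | 1.382 | -0.355 | -0.125 | 3.619 | 0.451 | 1.459 |
| | $\vert Y_{B}^{H}\rangle = 4.876\vert 1\rangle + 0.110\vert H\rangle + 0.004\vert H^2\rangle - 2.3\cdot 10^{-4}\vert H^3\rangle - 1.61\cdot 10^{-6}\vert H^4\rangle + 6.3\cdot 10^{-10}\vert H^6\rangle$ | 0.856 | 0.996 | 1.163 | 19.088 | 6.709 | 9.349 | 1.166 | 6.908 |

(a) the statistical Pearson correlation factor; (b) computed from Equation (7); (c) computed from Equation (9); (d) computed from Equation (10) with $t_{Tabulated}(0.99; 20) = 2.845$; (e) computed from Equation (11) with $F_{Tabulated}(0.99; 1, 21) = 8.02$; (f) computed from Equation (8). 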



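Table 6. Correlation equations for the Group-II models of Table 3 and the molecular structures and data of Table 4. Entries marked [?] or [missing] are not legible in the source. 

| Catastrophe | QSAR Model | $R_{Pearson}$ (a) | $R_{ALG}$ (b) | r (c) | t-Stud. | t (d) | Fisher | f (e) | Π (f) |
|---|---|---|---|---|---|---|---|---|---|
| QSAR (II) | $\vert Y_{II}^{LogP,POL}\rangle = -2.044\vert 1\rangle + 0.051\vert LogP\rangle + 0.242\vert POL\rangle$ | 0.556 | 0.989 | 1.778 | -0.702 | -0.245 | 4.464 | 0.763 | 1.9504 |
| | $\vert Y_{II}^{LogP,H}\rangle = 5.379\vert 1\rangle + 0.304\vert LogP\rangle - 0.023\vert H\rangle$ | 0.556 | 0.989 | 1.778 | 18.564 | 6.489 | 4.468 | 0.764 | 6.771 |
| | $\vert Y_{II}^{POL,H}\rangle = -2.637\vert 1\rangle + 0.248\vert POL\rangle - 0.021\vert H\rangle$ | 0.728 | 0.992 | 1.363 | -1.151 | -0.402 | 11.302 | 1.932 | 2.398 |
| Hyperbolic umbilic (HU) | $\vert Y_{HU}^{LogP,POL}\rangle = -39.499\vert 1\rangle - 2.463\vert LogP\rangle + 2.043\vert POL\rangle + 0.104\vert (LogP)(POL)\rangle - 0.145\vert LogP^3\rangle - 6\cdot 10^{[?]}\vert POL^3\rangle$ | 0.715 | 0.992 | 1.387 | -2.215 | -0.774 | 3.561 | 0.609 | [missing] |
| | $\vert Y_{HU}^{LogP,H}\rangle = 5.319\vert 1\rangle + 1.083\vert LogP\rangle - 0.002\vert H\rangle\ [?]\ 0.003\vert (LogP)(H)\rangle - 0.161\vert LogP^3\rangle - 9\cdot 10^{-6}\vert H^3\rangle$ | 0.736 | 0.992 | 1.3485 | 19.328 | 6.756 | 4.019 | 0.687 | [missing] |
| | $\vert Y_{HU}^{POL,H}\rangle = -13.192\vert 1\rangle + 0.766\vert POL\rangle + 0.122\vert H\rangle - 0.004\vert (POL)(H)\rangle - 2\cdot 10^{[?]}\vert POL^3\rangle - 5.1\cdot 10^{[?]}\vert H^3\rangle$ | 0.755 | 0.993 | 1.315 | -0.79 | -0.276 | 4.503 | 0.770 | [missing] |
| Elliptic umbilic (EU) | $\vert Y_{EU}^{LogP,POL}\rangle_a = -69.262\vert 1\rangle - 0.556\vert LogP\rangle + 4.531\vert POL\rangle + 0.443\vert LogP^2\rangle - 0.068\vert POL^2\rangle + 0.002\vert (LogP)(POL^2)\rangle - 0.322\vert LogP^3\rangle$ | 0.757 | 0.993 | 1.312 | -2.548 | -0.891 | 3.582 | 0.612 | [missing] |
| | $\vert Y_{EU}^{LogP,POL}\rangle_b = 644.623\vert 1\rangle + 0.022\vert LogP\rangle - 59.934\vert POL\rangle + 0.467\vert LogP^2\rangle + 1.855\vert POL^2\rangle - 0.015\vert (POL)(LogP^2)\rangle - 0.019\vert POL^3\rangle$ | 0.722 | 0.992 | 1.374 | 1.866 | 0.652 | 2.908 | 0.497 | [missing] |
| | $\vert Y_{EU}^{LogP,H}\rangle_a = 5.022\vert 1\rangle + 0.974\vert LogP\rangle + 0.025\vert H\rangle + 0.530\vert LogP^2\rangle + 0.001\vert H^2\rangle\ [?]\ 2.87\cdot 10^{-4}\vert (LogP)(H^2)\rangle - 0.359\vert LogP^3\rangle$ | 0.843 | 0.995 | 1.181 | 20.638 | 7.214 | 6.542 | 1.118 | [missing] |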



Int. J. Mol. Sci. 2011, 12 



9552 



Table 6. Cont. 



Catastrophe 


QSAR Model 


R < a > 


R <b) 




t-Stud. 


,(d) 


Fisher 






Elliptic 
umbilic 
(EU) 




-0.21 
+ 0.00 


.7791) + 0.643 \LogP) 
[LogP 2 ) + 0.004\H 2 ) 
\(H"(LogP 2 )) + 5-\0- 


+ 0.029//) 

5 // 3 ) 


0.851 


0.995 


1.170 


17.047 


5.958 


7.015 


1.199 


6.189 




Y™ uh ) a =807.8221} -74.6 
+ 2.291 \POL 2 ) + 0.005 
-2-10- 4 (POzX// 2 ))- 


31 \PO, 

H>) 

0.023 


L)-0.02//) 
POL') 


0.857 


0.996 


1.162 


3.124 


1.092 


7.346 


1.256 


2.029 




+ 0.011 
-4-10 


L.8881)- 0.562 \POL) 
[\POL 2 ) + 0.004 \H 2 ) 
5 \(h){pOL 2 )) + 4-\0- 


+ 0.068//) 

5 |// 3 ) 


0.853 


0.996 


1.167 


0.532 


0.186 


7.120 


1.217 


1.696 


Parabolic 
umbilic 
(PU) 


Ypu P ' pol ) a = 474.915 1) + 0.021 \LogP) 
+ 0.454 \LogP 2 } + 0.9 14 \POL 2 ) 
-0.015 \(LogP 2 \POL))-\0 " 


- 39.256 POL) 
POL 4 ) 


0.722 


0.992 


1.374 


1.817 


0.635 


2.905 


0.497 


1.593 


Yp°f' P ° L ) B = -67.522 1) - 1 .539 \LogP] 
+ 0.573 \LogP 2 ) - 0.067 \POL 2 ) 
+ 0.002 (POL 2 \LogP)) -0.115 


> + 4.444 POL) 
^ogP 4 ) 


0.703 


0.992 


1.411 


-2.219 


-0.776 


2.611 


0.446 


1.671 
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Table 6. Cont. 



Catastrophe 


QSAR Model 


R < a > 


R <b) 




t-Stud. 


,(d) 


Fisher 






Parabolic 
umbilic 
(PU) 




-0.24C 
+ 0.002 


552 1) + 0.700 ZogP) + 0.04 1 H) 
LogP 2 ) + 0.004\H 2 ) 
\LogP 2 \H))-\0- 6 // 4 ) 


0.874 


0.996 


1.140 


20.243 


7.075 


8.645 


1.478 


7.317 




Yp° u gP - H ) B =5.101) + 0.552 \Lo 
+ 0.460 LogP 2 ) + 9.57 -10" 4 
+ 1.93-10 4 \(H 2 \LogP))-( 


gP) + 0.020//) 
).099 ZogP 4 ) 


0.767 


0.993 


1.295 


16.828 


5.882 


3.815 


0.652 


6.058 




Y™ uh ) a = 8.8761) -0.366 \POL) + 0.069//) 
+ 0.008 \POL 2 ) + 0.003 \H 2 ) 
-3.7 -10 5 (PO/ 2 X//))-4.5-10 7 // 4 ) 


0.841 


0.995 


1.183 


0.386 


0.135 


6.447 


1.102 


1.623 


Y™ l - h ) b = 595.212 1) - 48.906 \POL) - 0.019 \H) 
+ 1.129 POL 2 ) + 5-10 3 \H 2 ) 
- 1 .49 • 1 0 4 (H 2 \POL)) -1.73 - 10 4 POL 4 ) 


0.856 


0.996 


1.163 


3.074 


1.074 


7.292 


1.246 


2.015 



(a) the statistical Pearson correlation factor; (b) computed from Equation (7); (c) computed from Equation (9); (d) computed from Equation (10) with t_Tabulated(0.99;19) = 2.861; (e) computed from Equation (11) with F_Tabulated(0.99;2,20) = 5.85; (f) computed from Equation (8).
Table 7. Single-structure matrices of the Euclidean distances ΔΠ of the QSAR and catastrophe models' relative statistics of Table 5
employing Equation (12).

Log P      F        C        ST       B
QSAR       1.750    2.645    2.905    3.627
F                   2.411    1.732    2.865
C                            1.437    1.174
ST                                    1.231

POL        F        C        ST       B
QSAR       0.517    1.198    0.828    0.830
F                   1.317    0.717    0.524
C                            0.670    0.983
ST                                    0.314

H          F        C        ST       B
QSAR       0.431    0.89     1.916    1.127
F                   1.054    1.793    1.242
C                            1.509    0.292
ST                                    1.29


Table 8. Differences ΔΠ between the single-structure matrices of the Euclidean distances in Table 7.

|Log P − POL|   F        C        ST       B
QSAR            1.233    1.446    2.076    2.797
F                        1.094    1.015    2.341
C                                 0.767    0.191
ST                                         0.917

|Log P − H|     F        C        ST       B
QSAR            1.32     1.755    0.988    2.501
F                        1.358    0.062    1.624
C                                 0.072    0.882
ST                                         0.059

|POL − H|       F        C        ST       B
QSAR            0.086    0.309    1.088    0.297
F                        0.264    1.076    0.717
C                                 0.839    0.691
ST                                         0.976
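
The entries of Table 8 can be cross-checked as absolute differences of the corresponding Table 7 entries. The sketch below is only an illustration of that arithmetic: the list layout, the pair labels and the function name `second_order` are assumptions introduced here, while the numbers are transcribed from Table 7.

```python
# Model pairs in the row-by-row order of the upper-triangular matrices of Table 7.
PAIRS = ["QSAR-F", "QSAR-C", "QSAR-ST", "QSAR-B",
         "F-C", "F-ST", "F-B", "C-ST", "C-B", "ST-B"]

# First-order Euclidean distances transcribed from Table 7.
LOGP = [1.750, 2.645, 2.905, 3.627, 2.411, 1.732, 2.865, 1.437, 1.174, 1.231]
POL = [0.517, 1.198, 0.828, 0.830, 1.317, 0.717, 0.524, 0.670, 0.983, 0.314]
H = [0.431, 0.890, 1.916, 1.127, 1.054, 1.793, 1.242, 1.509, 0.292, 1.290]


def second_order(a, b):
    """Absolute differences between two distance lists (assumed Table 8 recipe)."""
    return [round(abs(x - y), 3) for x, y in zip(a, b)]


differences = {
    "LogP-POL": second_order(LOGP, POL),
    "LogP-H": second_order(LOGP, H),
    "POL-H": second_order(POL, H),
}

# Smallest second-order difference per descriptor pair, with the model pair producing it.
minima = {name: (PAIRS[vals.index(min(vals))], min(vals))
          for name, vals in differences.items()}

# Descriptor pairs ordered by increasing minimum difference.
ranking = sorted(minima, key=lambda name: minima[name][1])

for name in ranking:
    print(name, minima[name])
```

Some recomputed entries differ from the printed ones in the last digit (for example 1.319 against 1.32), which is consistent with Table 7 being rounded to three decimals.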



Table 9. Single-structure matrices of the Euclidean distances ΔΠ of the QSAR and catastrophe models' relative statistics of Table 6
employing Equation (12); note that for the degenerate models of Table 6, the one displaying the higher relative statistical
power (Π) is employed.

Log P ∧ POL   HU       EU       PU
QSAR          0.675    0.810    1.005
HU                     0.139    1.414
EU                              1.531

Log P ∧ H     HU       EU       PU
QSAR          0.512    0.917    1.123
HU                     0.964    0.878
EU                              1.152

POL ∧ H       HU       EU       PU
QSAR          1.170    1.652    1.640
HU                     1.46     1.440
EU                              0.02






For the trial set of molecules from Figure 1 and Table 4, the results in Tables 5 and 6 can be 
interpreted as follows: 

First, it is clear that consideration of the catastrophe (polynomial) correlations is an
improvement over the old multi-linear QSAR statistics (see also Appendix A2).
The hydrophobicity indicator generally gives low correlations with any polynomial (linear,
multilinear or catastrophe) approach and is a rather irrelevant linear QSAR descriptor (Table 5);
its influence nevertheless improves up to twofold within the swallow tail and butterfly
phenomenologies once its fifth and sixth powers are included. This indicates the value of
catastrophe-QSAR for achieving a deeper understanding of the molecular
mechanics of specific interactions when the normal multi-linear QSAR does not assign
transport descriptors much predictive power.

The relative statistical power, as defined by Equation (8), does not always parallel the Pearson 
coefficient or the relative correlation factors, as is evident from Tables 5 and 6. However, 
because it includes more statistical information, we consider a model as relevant when it has 
greater individual output of this newly introduced statistical index. In particular, neither the 
linear nor the multilinear QSAR framework provides a good fit between the statistical 
correlation and the relative statistical power using the structural parameter combinations 
considered. Instead, parabolic catastrophe correlations, the cusp and butterfly models, are 
revealed to be quite relevant, in particular regarding the formation energy (H) for which they 
show the highest Pearson correlation and relative statistical power values in comparison with 
the other descriptors plugged into these models. Unfortunately, for the two-variable descriptor 
models of Table 6, no consistency was found between the highest Pearson value and the 
relative statistical power apart from a few degenerate cases of descriptors for the parabolic 
models where the highest relative statistical power value corresponds with the highest Pearson 
correlation. Note that for the degenerate cases of Table 6, when two mixed descriptors can be
combined in two distinct ways, the model with the maximum relative statistical power is taken
as the working model.
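
Equation (8) itself is not reproduced in this part of the text, but the tabulated values of Table 6 are numerically consistent with a Euclidean norm of three relative statistics: the ratio of the algebraic to the Pearson correlation, the Student t value relative to its tabulated threshold, and the Fisher F value relative to its tabulated threshold. The sketch below assumes that form; the function and constant names are illustrative, and the thresholds are those quoted in the Table 6 footnote.

```python
import math

# Tabulated thresholds quoted in the footnote of Table 6.
T_TAB = 2.861  # t(0.99; 19)
F_TAB = 5.85   # F(0.99; 2, 20)


def relative_statistical_power(r_pearson, r_algebraic, t_value, f_value):
    """Euclidean norm of the three relative statistics (assumed form of Equation (8))."""
    r_rel = r_algebraic / r_pearson  # reproduces the tabulated correlation ratio
    t_rel = t_value / T_TAB          # Student t relative to its tabulated value
    f_rel = f_value / F_TAB          # Fisher F relative to its tabulated value
    return math.sqrt(r_rel ** 2 + t_rel ** 2 + f_rel ** 2)


# One elliptic umbilic row of Table 6: R = 0.851, R_alg = 0.995, t = 17.047, F = 7.015.
pi_eu = relative_statistical_power(0.851, 0.995, 17.047, 7.015)
print(round(pi_eu, 3))
```

For this row the table lists 6.189 as the relative statistical power, so the assumed form can be compared directly with the printed value; other rows of Table 6 agree to within rounding of the last digit.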

However, because the two-fold aim of the present research is to find the best predictive model and 
the molecular mechanism of action for the given set of molecules, the statistical indices of Tables 5 
and 6 are employed to compute the first- and second-order differences (or distances) in relative 
statistical power as described by Equations (12-15) of Section 3. They correspond to the 
inter-descriptor/inter-modeling paths of molecular actions, whose minimum values are identified 
according to the prescription of Equation (16). 

Through this minimal relative statistical power path recipe, once the models and descriptors 
predicted to be on the forefront of the structure-action interaction are selected, they are then further 
filtered with the testing set to finally identify the best predictive model and reveal the mechanism of 
action by means of the structural descriptors considered. 

In the present case of the HIV inhibitors in Table 4, the data computed from Tables 5 and 6 provide 
the results for Tables 7-9, to be discussed herein: 

Table 7: At the individual descriptor level, the cusp and butterfly models are very close to each
other for Log P and the forming energy H. This closeness is more significant for the hydrophobicity,
because for the forming energy it transpires from Table 5 that the butterfly model practically
reduces to the cusp model, as the sixth-order contribution virtually vanishes. However, for the
structural influence of polarizability (POL) the butterfly and swallow tail are the closest
models. When one considers the hierarchy of the individual descriptors according to their
QSAR-I models in Table 5 in terms of the reduction in relative statistical power

$$\mathrm{Log}P \rightarrow H \rightarrow \mathrm{POL} \tag{17a}$$

through combining it with the catastrophes involved in Table 7, one correspondingly obtains the 
evolution cycle of the models: 

$$(\ldots \rightarrow)\,[\text{Butterfly}] \rightarrow [\text{Cusp}] \rightarrow [\text{Butterfly}] \rightarrow [\text{SwallowTail}]\,(\rightarrow \ldots) \tag{17b}$$
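
The closest model pairs behind cycle (17b) can be read off Table 7 as the smallest entry of each single-descriptor matrix. The following sketch illustrates that lookup; the helper names and the dictionary layout are assumptions made for illustration, and the distances are transcribed from Table 7.

```python
MODELS = ["QSAR", "F", "C", "ST", "B"]


def upper_triangle(rows):
    """Map model pairs to distances; rows[i] lists distances from MODELS[i] to later models."""
    return {
        (MODELS[i], MODELS[i + 1 + k]): distance
        for i, row in enumerate(rows)
        for k, distance in enumerate(row)
    }


# First-order Euclidean distances transcribed from Table 7.
TABLE_7 = {
    "LogP": upper_triangle([[1.750, 2.645, 2.905, 3.627],
                            [2.411, 1.732, 2.865], [1.437, 1.174], [1.231]]),
    "POL": upper_triangle([[0.517, 1.198, 0.828, 0.830],
                           [1.317, 0.717, 0.524], [0.670, 0.983], [0.314]]),
    "H": upper_triangle([[0.431, 0.89, 1.916, 1.127],
                         [1.054, 1.793, 1.242], [1.509, 0.292], [1.29]]),
}

# Closest pair of models for each descriptor (minimum inter-model distance).
closest = {name: min(dist, key=dist.get) for name, dist in TABLE_7.items()}

for name in ("LogP", "H", "POL"):  # descriptor order of Equation (17a)
    print(name, closest[name], TABLE_7[name][closest[name]])
```

Read in the descriptor order of Equation (17a), the minima link the cusp and butterfly models for Log P and H, and the swallow tail and butterfly models for POL.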

Table 8: The second-order differences between the individual inter-model paths of Table 7
describe the further variation of those paths. Here the QSAR-I and fold (F) catastrophe
models also intervene, in changing the influence on specific interactions from POL to H.
Therefore, by ordering these paths by their minima, the distance hierarchy is obtained as follows:

$$(\mathrm{Log}P + H) \rightarrow (H + \mathrm{POL}) \rightarrow (\mathrm{POL} + \mathrm{Log}P) \tag{18a}$$

which, remarkably, confirms the descriptors' cycles of influence in accordance with the first order 
prescription of Equation (17a). However, a more detailed succession is recorded for the 
inter-model evolution: 

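$$(\ldots \rightarrow)\,[\text{Butterfly}] \equiv [\text{SwallowTail}] \rightarrow [\text{QSAR-I}] \equiv [\text{Fold}] \rightarrow [\text{Cusp}] \equiv [\text{Butterfly}]\,(\rightarrow \ldots) \tag{18b}$$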

Comparing cycle (18b) with (17b), the QSAR-I and Fold models appear in (18b) only in a
second cycle, after the first one has been performed on the prescription of (17b). For this reason, the
direct second-order inter-descriptor, inter-model analysis is also undertaken, and the results are reported in
Table 9, discussed next.

Table 9: Interestingly, in terms of the two structural descriptors, the QSAR model is present 
even though its individual statistics are not the highest in Table 6; however, judging by the 
ordering of minimum paths recorded, the coupling descriptors hierarchy is established as: 

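$$(H\,\&\,\mathrm{POL}) \rightarrow (\mathrm{POL}\,\&\,\mathrm{Log}P) \rightarrow (\mathrm{Log}P\,\&\,H) \tag{19a}$$

which is associated with the models' evolution

$$(\ldots \rightarrow)\,[\mathrm{PU}] \rightarrow [\mathrm{EU}] \rightarrow [\mathrm{HU}] \rightarrow [\mathrm{QSAR}]\,(\rightarrow [\mathrm{PU}] \rightarrow \ldots) \tag{19b}$$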
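
The ordering (19a) corresponds to ranking the three descriptor pairs of Table 9 by their smallest inter-model distance. A minimal sketch of that ranking is given below; the dictionary keys and variable names are illustrative assumptions, and the values are transcribed from Table 9.

```python
# Euclidean distances transcribed from Table 9, keyed by model pair.
TABLE_9 = {
    "LogP&POL": {"QSAR-HU": 0.675, "QSAR-EU": 0.810, "QSAR-PU": 1.005,
                 "HU-EU": 0.139, "HU-PU": 1.414, "EU-PU": 1.531},
    "LogP&H": {"QSAR-HU": 0.512, "QSAR-EU": 0.917, "QSAR-PU": 1.123,
               "HU-EU": 0.964, "HU-PU": 0.878, "EU-PU": 1.152},
    "POL&H": {"QSAR-HU": 1.170, "QSAR-EU": 1.652, "QSAR-PU": 1.640,
              "HU-EU": 1.46, "HU-PU": 1.440, "EU-PU": 0.02},
}

# Minimum path (model pair and distance) for each pair of descriptors.
min_paths = {pair: min(dist.items(), key=lambda item: item[1])
             for pair, dist in TABLE_9.items()}

# Descriptor pairs ordered by increasing minimum distance.
hierarchy = sorted(min_paths, key=lambda pair: min_paths[pair][1])

for pair in hierarchy:
    print(pair, min_paths[pair])
```

The minima fall on the EU-PU, HU-EU and QSAR-HU pairs in that order, which matches the model succession written in (19b).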

The descriptor hierarchies [(17a), (18a), (19a)] and the models' cycles [(17b), (18b) and (19b)]
are connected by means of the predictive power of the models along the minimum
paths identified in Tables 7 and 9 with the single and double descriptors, respectively, for the
non-Gaussian (NG) molecules of Table 4 and Figure 1. The results are presented in
Tables 10 and 11.
Table 10. Predicted activity as computed for the non-Gaussian molecules of Table 4 with
the models of Table 5 found along the minimum paths of Table 7; for each predicted
model, its correlation with the observed activity is indicated at the bottom of the table.

Molecule     Y_C^(LogP)   Y_C^(H)   Y_ST^(POL)   Y_B^(LogP)   Y_B^(POL)   Y_B^(H)
NG1          5.586        6.179     5.294        5.094        -20.595     5.687
NG2          5.729        4.885     4.294        5.719        -9.764      4.360
NG3          5.676        0.415     4.708        5.531        -13.457     -7.932
NG4          5.729        6.156     5.149        6.657        -29.709     5.259
NG5          6.487        6.141     5.309        6.705        -25.700     5.923
NG6          6.399        5.438     5.258        6.708        -27.365     5.219
NG7          6.903        5.631     5.319        5.311        -21.540     5.984
NG8          6.904        5.334     5.027        5.995        -31.693     5.566
NG9          5.580        4.9357    5.328        5.054        -24.666     4.383
R-Pearson    0.195        0.129     0.174        0.701        0.488       0.026



Table 11. Predicted activity as computed for the non-Gaussian molecules of Table 4 with 
the models of Table 6 founded along the minimum paths of Table 9; for each predicted 
model, its correlation with the observed activity is indicated at the bottom of the table. 



Molecule\^ 


yLogP.H ^ 


yLogP.POL \ 
I HU J 


Y L H °u sP ' H ) 
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The results of correlation tests in Table 10 indicate the structure index-model activity hierarchy: 
Notably, the influences of POL and H are reversed relative to the trial succession prescribed by
Equation (17a), while hydrophobicity is revealed as the main influential factor. However, because
the predicted activities for POL in Table 10 all run in the "opposite evolution direction" with
respect to the activities recorded in Table 4, i.e., they are all negative, the uni-parametric tests and their
associated hierarchy (20) are discarded, and one looks toward the second class of QSAR and
catastrophe algorithms.

Instead, the test correlations of Table 11 provide the structure-activity ordering for the
bi-parameter models
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Remarkably, the hierarchy (21) starts with the QSAR model, which is revealed to be at the top of 
the validated catastrophe models with statistical performance even higher than through the predicted 
equation of Table 6 and the trial set of Table 4. Moreover, the QSAR-II model involves parameters 
(Log P & H) that are followed by the hyperbolic umbilic (HU) model in terms of (Log P & POL) 
parameters, in this way recovering the original mono-structural influences as anticipated by 
Equations (17a) and (18a). Thus, the series of models in Equation (21) is validated, and it will be 
further employed to establish the models' successions and the molecular structural pattern of inhibiting 
anti-HIV-1 drug resistance. To this end, apart from the first and last models of Equation (21), which 
are associated with the maximum (0.778) and minimum (0.057) test performance, the middle 
catastrophe models provide closely related performance in the range (0.431, 0.468). Their graphical 
3D-representation of the parametric domains Log P: (-1.50, 2.72), POL: (27.87, 38.48) and 
H: (-63.299, 17.808) of all (trial and test) structures in Table 4 are displayed in Figure 2. Next, it is 
apparent that they can be coupled according to the same spanned domains, thus forming the activity 

- 1 Y L F f- POL ) , I Y P ° L,H ) - 1 Yp° LiH ) , plotted in the top 



models' differences \Y^ ogP,H 



fLogP.H 
HU 



yLogP.POL 
HU 



of Figure 3. Through registering the parameters and the models' successions: 
$$[\mathrm{QSAR}]^{\mathrm{Log}P,H} > [\mathrm{HU}]^{\mathrm{Log}P,\mathrm{POL}} > [\mathrm{EU}]^{\mathrm{POL},H} > [\mathrm{PU}] \tag{22}$$



one may reach the following important conceptual-computational conclusions: 

• The HIV-1 inhibitory activity is triggered by a hydrophobic interaction, followed by energetic
stabilization of the ligand/substrate (pyridinone derivative/viral protein) interaction, here
modeled by the heat of molecular formation, and eventually completed by the ionic field
influence, here represented by the polarizability descriptor.

• Although the QSAR multi-linear model should not be excluded from the molecular modeling 
of complex bio-chemical interactions, it should be complemented with other polynomial 
correlational catastrophe-type models that produce significant results comparable to those of 
other 3D-modeling procedures such as docking-based comparative molecular field analysis 
(CoMFA) and comparative molecular similarity indices analysis (CoMSIA) [24]. 

However, the issue remains of establishing the molecular structure most suitable for HIV-1 
inhibitory activity among the considered pool of pyridinone derivatives in Table 4. To this end, the 
representations in Figure 3 are synergistically employed to identify the molecular structural domains 
that optimally promote binding of the pyridine derivative to the hydrophobic pocket in the p66 subunit 
of HIV-1 through searching for joint fulfillment of the following structural parameters and inter-model 
evolutionary generic principles: 

• Log P: For positive values, the compound behaves hydrophobically and requires dissolution in 
an organic solvent; by contrast, for negative values the compound is hydrophilic and can be 
dissolved directly in an aqueous buffer. For Log P equal to 0, the compound partitions at a 1:1 
organic-to-aqueous phase ratio, meaning that it is likely soluble in both organic and aqueous 
solvents and in cellular environments; thus, values of Log P equal to or greater than zero are 
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selected to achieve hydrophobicity and suitability for the cellular environment [43,44], while 
characterizing the stacking bonding of aromatic rings [45]; 

• H: Because the formation of a compound from its elements usually is an exothermic process,
most heats of formation are negative, and this is also a characteristic of the dynamic
equilibrium of ligand-substrate interactions [46]; note that the advantage of using the heat of
formation as a QSAR descriptor is that it relates thermodynamically to the
free energy ΔG = −RT ln K_eq through the equilibrium constant K_eq, which parallels the recorded
activity at the thermodynamic level [24]; it nevertheless expands the Gibbs free energy from the
hydrogen to covalent bonding strength [45];

• POL: It is expected that "the natural direction of evolution of any system is towards a state of
minimum polarizability" [47], while accounting for the dipolar interaction [45];

• Activity Models: Represent the same chemical-biological process providing their differences 
with respect to structural domains are minimized to zero. 
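
As a small numerical aside on the relation ΔG = −RT ln K_eq quoted in the H item above, the sketch below evaluates the free energy for a given equilibrium constant. The function name, the default temperature of 298.15 K and the use of kcal/mol units are assumptions chosen only for illustration; they are not taken from the study.

```python
import math

# Gas constant in kcal/(mol*K); the unit choice mirrors the kcal/mol used for H.
R_GAS = 1.987204e-3


def gibbs_free_energy(k_eq, temperature=298.15):
    """Return Delta G = -R T ln(K_eq) in kcal/mol for an equilibrium constant k_eq."""
    if k_eq <= 0:
        raise ValueError("the equilibrium constant must be positive")
    return -R_GAS * temperature * math.log(k_eq)


# K_eq > 1 (binding favored) gives a negative free energy; K_eq = 1 gives zero.
for k_eq in (1.0, 1.0e3, 1.0e6):
    print(k_eq, round(gibbs_free_energy(k_eq), 3))
```

Larger equilibrium constants give more negative free energies, which is the sense in which a more negative energetic descriptor parallels a higher recorded activity.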



Figure 2. 3D-representations of the QSAR and catastrophe activities for the tested models
of Table 11 in the range of the structural indicators (Log P, POL, H) as abstracted from
Table 4.
Figure 3. Determination of the structural domains of pyridinone-derivative type
non-nucleoside reverse transcriptase inhibitors in the same range of structural descriptors
by employing the principles of hydrophobicity, minimum polarizability, binding energy,
and the minimum difference between the polynomial activity models of Figure 2; the
hydrophobic pocket was identified in the p66 subunit of HIV-1 RT of specific transferase
R221239 [48,49].



[Figure 3 panels: Hyperbolic Umbilic \ Elliptic Umbilic-A; Elliptic Umbilic-A \ Parabolic Umbilic-B]




These principles are applied to the activity models' differences at the top of Figure 3, and they lead 
to the identification of the structural domain (and even points) characteristic of the pyridinone 
derivative most well-adapted to inhibiting the HIV-1 life cycle. The graphical results in Figure 3 
suggest that the ordering of the structural indicators is: 
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(23a) 



(23b) 



(23c) 



The "solution" of system (23) gives the actual molecules in Table 4 predicted to be the most potent
binding inhibitors, namely compounds 27 (Log P ≅ 2.72, H ≅ −39.459 kcal/mol, POL ≅ 35.55 Å³),
28 (Log P ≅ 1.06, H ≅ −34.478 kcal/mol, POL ≅ 34.88 Å³), and 29 (Log P ≅ 0.96,
H ≅ −21.361 kcal/mol, POL ≅ 35.17 Å³). Notably, these molecules were also predicted by
the much more sophisticated methods of CoMFA and CoMSIA as having increased binding affinity
between the aromatic ring (or wing 2 of the pyridinone derivative) and amino acid Tyr181 for the first
molecule and Tyr188 for the last two. These two amino acids are very important in the inhibition of RT
by NNRTIs because the most common mutations are Tyr181Cys and Tyr188Cys, and they are
responsible for the emergence of viruses resistant to pyridinone derivatives. Therefore, designing
pyridinone compounds that allow aromatic ring stacking interactions with Tyr181 and Tyr188 may
prevent these mutations and increase the activity of these anti-HIV drugs.

Overall, the QSAR presented here, combined with catastrophe polynomial structure-activity
relationships, provides a reliable conceptual and computational tool for identifying the mechanisms
underlying ligand-substrate interactions and the structural domains best able to promote them.
Consequently, this method should be further integrated into automated data processing and tested on 
other complex open systems with bio- or eco-toxicological relevance, especially where evolutionary 
life-cycles are present. 

5. Conclusions 

One of the most challenging battlefields in metabolic virology focuses on the complete and 
sustained inhibition of the HIV life cycle at its various levels. Thus: "an ideal anti-HIV agent should 
stop the virus' progress and also the infection of healthy host cells, with no toxicity against normal cell 
physiology" [50]. Moreover, the ideal anti-HIV agent should avoid the drug-resistance phenomenon of 
HIV mutant variants. QSAR techniques are cost-effective computer-assisted drug design methods that 
can be used to obtain potential anti-HIV compounds with powerful biological effects and the lowest 
possible levels of side-effects and toxicity. 

As the predictive roles of modeling and quantitative structure-activity relationships (QSAR) in
medicinal chemistry and drug synthesis are now recognized [51,52], thereby corroborating recent
intriguing reports on the modest performance of direct statistical multilinear correlations in genotoxic
carcinogenesis modeling of covalent drug binding to DNA followed by mutagenesis [53], the present
study advances the idea of non-linear polynomial fits of the observed/experimentally available
Activity = f(X_1, X_2), with X_1, X_2 being structural physicochemical parameters (usually
hydrophobicity, polarizability and/or forming heat energy, in accordance with the basic recommendation
of Hansch) [54], under the seven polynomial forms inspired by Thom's catastrophe theory [1]
(see Table 3).

As an application of the emerging catastrophe-QSAR analysis to a recently reported set of 
pyridinone derivatives with non-nucleoside reverse transcriptase inhibitor activity, [24] all the modeling 
stages required by the OECD-QSAR principles [32] are implemented here in a synergistic 
manner, namely: 

(i) A defined endpoint: The hydrophobic binding of the inhibitor in the pocket of the p66 subunit 
of reverse-transcriptase was confirmed herein through the identification of hydrophobicity as 
the major influence among all the mono-nonlinear catastrophes employed; see Equation (17). 

(ii) An unambiguous algorithm: The Spectral-SAR minimum path principle [31,55-57] is here
generalized to include a relevant combination of statistical information (e.g., the correlation
factor R, Student's t-test, Fisher's F-test) to provide an equal-footing multi-dimensional Euclidean
distance [see Equations (8)-(16)], thus avoiding the previously identified discrepancy in judging
the mid-range performance in terms of correlation or other statistical factors [56].

(iii) A defined domain of applicability: By performing linear vs. non-linear QSARs, the present
strategy allows for the identification of recommended applicable structural domains through 
setting their difference to zero via inter-model activity minimization, which is equivalent to 
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assuring the "smoothness" of the inhibitor-protein binding evolution towards the final steric 
inhibition output. 

(iv) Appropriate measures of goodness-of-fit, robustness and predictivity: The trial results were 
evaluated by external validation employing a testing set, which was selected by means of 
Gaussian vs. non-Gaussian distributions of the compounds' activities, an improvement over the 
earlier arbitrariness of sampling the compounds only within a certain activity range. For 
instance, for linear QSAR the predicted correlation was superior to the tested correlation, thus 
confirming the reliability of this validation technique. 

(v) A mechanistic interpretation: The selected succession of catastrophe-QSARs indicates that the 
inhibitor-HIV protein binding mutations that are involved in "birth and death" processes are 
associated with "waves" of induced activity in certain structural domain variants (see Figure 2). 
Moreover, the flat QSAR hypersurface should be complemented with catastrophe analysis to 
determine the specific structural domains for optimum interactions (see Figure 3) and for the 
associated molecular structure design of NNRT inhibitors. 

Because the catastrophe-QSAR approach was found to successfully identify the molecular
compounds with the highest anti-HIV-1 potency, as also predicted by other 3D-QSAR methods, these results
encourage further applications and implementations of Thom's non-linear correlations with the goal of
analytically modeling complex dynamic ligand-receptor interactions, especially on the molecular
fragment or structural alert level [41], on a chemometric basis.
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Appendix 

A1. More on Catastrophe Theory Background

The starting point of Catastrophe Theory lies in expressing the Taylor series associated with a
smooth function $\eta(c,x)$, $(c,x) = (c_1,\ldots,c_k,x_1,\ldots,x_m)$, say around its origin $(c,x) = 0$, in the form

$$\eta(c,x) = j^s\eta(c,x) + \mathrm{tayl} \tag{A.1}$$

viewed as the summation of the so-called s-jet or s-current

$$j^s\eta(c,x) = \sum_{r=0}^{s}\frac{1}{r!}\,D^r\eta\big|_{0}\,x^r \tag{A.2}$$

and of its tail, generically called here "tayl". However, in modeling natural phenomena, unlike
the regular ones (such as planetary orbits) or the continuous ones (with small perturbations included), for which
truncation to the s-jet works well, many recorded events display a sudden (or "catastrophic")
character, like earthquakes, population growth, or cancer spreading, and thus require
the Taylor tail to be counted as well; this need was elegantly summarized by E. C. Zeeman, one of the
pioneers of Catastrophe Theory [58], as "allowing the tail of the Taylor series to wag the dog". When
the tayl part becomes important, it takes the form of a quadratic dependency on the joint control × behavior
space where the original function was defined:

$$\mathrm{tayl} \propto g^2(c,x) \tag{A.3}$$

This is due to the celebrated Morse bifurcation lemma [59] around the so-called critical points of
the original function, see Equation (3) of the main text, where the original
function becomes equivalent to the family of functions

$$\eta(c,x) \cong \tilde{\eta}\big(g_1(c,x),\ldots,g_s(c,x)\big) \pm g_{s+1}^2 \pm \ldots \pm g_m^2 \tag{A.4}$$

Here s also stands for the co-rank of the Hessian of $\eta(c,x)$ at the point $(c,x) = 0$. The main question
that arises is how to identify the so-called local types of function in a k-parametric (control
space) family of functions or, even more, given a function, how to identify in its neighborhood the
family it belongs to. The solution to this problem was furnished by Thom [1] and then by Arnold [60],
using the powerful concepts of co-dimension and structural transversality, such that the resulting
classification theorem formulates the seven elementary catastrophe functions of Table 2 as
governing all natural phenomena where the co-dimension is no greater than 4 (four). To better
understand that this indeed covers a quite general plethora of natural dynamic systems (with
complicated local/turning/singular points modeling sudden changes), it is enough to recall the heuristic
example of the co-dimension of the England-Scotland frontier, which is always equal to 1
(one) no matter whether one represents the frontier as a line (the road along it), as bidimensional (the road
through it on the Earth), as tridimensional (the road through it by plane), or as 4-D (in the relativistic
view, when the space-time cone is considered along it as well) [61]. It is this co-dimension that
controls so powerfully the reduction of all possible power expansions of smooth functions to those
seven presented in Table 2; there, one sees that the co-dimension is always equal to the number of parameters from
the control space appearing in the Thom polynomials; they, in fact, represent families of functions, i.e.,
they control large classes of functions that drive open systems in similar (local) ways. In taxonomical
(or algebraic) terms, it is said that although not all functions are typical (or elementary), their
families are typical as families. In analytical terms, as all minima through the origin look the same (they
are said to be typical, like the Morse minima of the generalized parabola $x_1^2 + \ldots + x_m^2$,
eventually after re-parameterization), likewise any transverse path through any non-Morse function that
can be found within a family of finite functions looks the same as all other transverse paths in the
family (those of Table 2). Moreover, the co-rank of those functions (as the co-rank of their Hessian at
the critical/singular/turning points) also fixes the minimum number of variables the function can be reduced
to; for example, if a function of 2011 variables has a critical point of co-rank equal to 1, the actual
function to be studied has only 1 variable! This makes Catastrophe Theory extremely interesting
for application to QSAR studies, where the available structural variables are listed over hundreds of
pages [62], while in fact one searches for modeling functions that enter natural classes or families of
functions with a universal character, as the Thom polynomials do, and therefore aims to work
with appropriate functions with a considerably lower number of variables/structural descriptors,
see Table 3.

A2. Catastrophe Theory Implication on Pearson Correlation 

Since the transformation of the original smooth function into catastrophe one involves the Morse 
parabolic polynomials contribution, see Equation (A.4), one may employ this recipe to consider the 
ordinary QSAR predicted activity, say Y QSAR , and of its transformation into the Catastrophe-QSAR 
one, say y rlQSAR , through the Gaussian mapping 

Y^(X„...X M )= Y^{X„...X M )± Y^ R (X 2 ,..X M ) (A .5) 



with 

X 

exp -- 

7=2 V 7t 



M 1 

Yi ^(X 2 ,..X M ) = j: J= exp 



4f7 ^ j 



(A.6) 



while referring to the running-indices assumed in Table 1. The form (A. 5) with (A.6) recovers the 
original QSAR predicted function/value when all dispersions over all structural variables vanish 

Y^(X X ,...X M )^^Y^(X 1 ,..X M ) (A.7) 

thus motivating the actual generalization for treating the natural non-zero dispersive phenomena. On 
the other side, for higher dispersive values of structural variables (i.e., when their domains of 
applicability eventually overlap and promote interactions, i.e., the appearance of cross products in 
Tables 2 and 3) it produces the second order development 

Y^ R {X„..X M ) ^ > 7g(X 1 ,...Xj±X 2 2 ±...±Xi (A.8) 
under appropriate transformations X — = X . — [X — I . However, one can see that in the 

r j=2,M j=2,M \ j=l,M I ' 

Catastrophe Theory's language the first function on the right-hand side of (A.8) stands for the 1-jet of
the function $Y^{QSAR}$, while the whole expression (A.8), having a Hessian co-rank of order 2, is fully
consistent with the maximum co-rank universal unfolding for the polynomials of Table 2.

Next, one checks the effect that the Gaussian development (i.e., the catastrophe transformation)
of (A.5) has on the statistical (Pearson) correlation coefficient with respect to the QSAR value



* 0 = Ji--I(4-^) 2 , o A = ± (4 - aJ (A .9) 

For the sake of clarity we will chose only one sign on (A.5), while the result will not depend on it, 
and successively obtain 

^=Ji--z(4-r e ^) 2 

_ I £ )_ ^ ^yMiriQSAR ^ + 2l_ ^ ^ _ y8 SAR ITIQSAR ^ 

V 0 °A '=1 ' ^ '=1 



(A 10) 



where in the last relation the Cauchy-Schwarz inequality was used: 



$$\sum_{i=1}^{N} u_i v_i \le \sqrt{\sum_{i=1}^{N} u_i^2}\,\sqrt{\sum_{i=1}^{N} v_i^2} \tag{A.11}$$



Next, in order to draw results that do not depend either on M-the number of structure variables nor 
on iV-the number of chemicals/molecules involved in a custom QSAR study, one assumes dealing with 
the same dispersion of the observed activity as well as for each descriptor (the so called homogeneous 
assumption, a = a A = cr., V/ = 1,7V) likely to be valid when dealing with great number of structural 

descriptors; this way, one actually performs the asymptotic limit $M \to \infty$ on (A.6) for all $i = \overline{1,N}$
and recognizes the Poisson integral result

f v2\ 



rco/r/QSAR 

o V^r 



00 1 I X i 
= f — 7=exp dX = 4& 



v 4a 



(A12) 



Accordingly, the inequality (A. 10) now reads 

Rl lr <Rl-N + 2^N(\-Rl) (A.13) 

It may be rearranged upon the second order equation in TV-chemicals' space 

-N + 2^(\-R 2 0 )jN + (R 2 Q -Rl /r )>0 (A. 14) 

whose universal fulfillment leads with the condition 

R x/r >\ (A15) 
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Since the result (A. 15) was obtained within asymptotic conditions regarding the number of 
structural descriptors and homogeneous dispersion against recorded activity, it can be naturally 
asserted to its minimum as 

R M,r^> l ^V R o (A.16) 

thus heuristically supporting the superiority of catastrophe-QSAR modeling over the conventional QSAR,
and therefore further motivating the present approach. As a numerical illustration, the general prescription
of inequality (A.16) is confirmed by all one-to-one (i.e., catastrophe-QSAR vs.
simple QSAR) results reported in Tables 5 and 6.
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